feat: Added a new trait to expose SchemaProvider #1621

parmesant wants to merge 2 commits into parseablehq:main
Conversation
Walkthrough

This PR adds crate-root re-exports (arrow, datafusion-proto, catalog/utils aliases), introduces a globally overridable schema provider, and adds query/session, metrics, and streaming helpers.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Query
    participant SCHEMA_PROVIDER
    participant ObjectStorage
    participant SessionState
    Client->>Query: create_session_context(tenant_id, storage?)
    alt SCHEMA_PROVIDER set
        Query->>SCHEMA_PROVIDER: new_provider(storage?, tenant_id)
        SCHEMA_PROVIDER->>Query: Box<dyn SchemaProvider>
    else fallback
        Query->>Query: instantiate GlobalSchemaProvider
        Query-->>Query: SchemaProvider
    end
    Query->>SessionState: register_schema(schema_provider.into())
    Query->>SessionState: build (append ADDITIONAL_PHYSICAL_OPTIMIZER_RULES)
    Client->>Query: execute(query)
    Query->>SessionState: create_physical_plan(...)
    Query->>ObjectStorage: read manifests (hot-tier)
    ObjectStorage-->>Query: manifest files
    Query->>metrics: increment_files_scanned_in_hottier_by_date(...)
    Query->>utils/arrow: into_flight_data_stream(stream)
    utils/arrow-->>Client: Flight DoGet stream
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/query/mod.rs (1)
77-100: ⚠️ Potential issue | 🟠 Major

Prevent late `SCHEMA_PROVIDER` registration from silently no-oping.

`SCHEMA_PROVIDER` is only consulted when schemas are registered, but `QUERY_SESSION` is a process-wide `Lazy`. If the cell is set after the first `QUERY_SESSION` access, the default `GlobalSchemaProvider` remains registered for the lifetime of that session, so the new extension point never takes effect for existing schemas. Please either enforce provider initialization before any session access or rebuild the session context when the provider is installed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/mod.rs` around lines 77 - 100, SCHEMA_PROVIDER can be registered too late and never affect the process-wide QUERY_SESSION (and QUERY_SESSION_STATE) because QUERY_SESSION is a Lazy created with the old provider; fix by ensuring provider initialization happens before any session access or by rebuilding the session when a provider is installed: update the code that sets SCHEMA_PROVIDER to, after successful OnceCell::set, call Query::create_session_context(PARSEABLE.storage()) and replace the stored session/context (QUERY_SESSION or its InMemorySessionContext.session_context) and likewise refresh QUERY_SESSION_STATE via Query::create_session_state(...) so the new ParseableSchemaProvider takes effect for existing sessions.
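The hazard described above can be reproduced with a std-only sketch. Here `OnceLock` stands in for the real `OnceCell`, and plain string labels stand in for the provider types; none of these names are Parseable's actual APIs, only an illustration of why a session built before the override is installed never sees it:

```rust
use std::sync::OnceLock;

// Stand-in for SCHEMA_PROVIDER; the &str values label provider kinds and
// are purely illustrative, not Parseable's real types.
static SCHEMA_PROVIDER: OnceLock<&'static str> = OnceLock::new();

// Mirrors QUERY_SESSION: the override cell is consulted exactly once,
// at the moment the session is built.
fn build_session() -> &'static str {
    SCHEMA_PROVIDER.get().copied().unwrap_or("GlobalSchemaProvider")
}

fn main() {
    // The first session access happens before the override is installed...
    let session = build_session();
    // ...so this set() succeeds yet is a silent no-op for that session.
    SCHEMA_PROVIDER.set("CustomSchemaProvider").unwrap();
    assert_eq!(session, "GlobalSchemaProvider");
    // Only a session rebuilt after registration sees the new provider.
    assert_eq!(build_session(), "CustomSchemaProvider");
    println!("late registration was a no-op for the first session");
}
```

This is why the comment asks to either force provider installation before any session access, or rebuild `QUERY_SESSION`/`QUERY_SESSION_STATE` when the cell is set.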
🧹 Nitpick comments (2)
src/lib.rs (1)
59-64: Clarify stability expectations for these new crate-root re-exports.

At Line 59, Line 60, Line 61, Line 62, Line 64, and Line 73, these `pub use` additions expand the public API surface. Please document whether they are part of a stable contract (or move them under a dedicated namespace) to avoid accidental long-term semver lock-in. Also applies to: 73-73
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib.rs` around lines 59 - 64, These new crate-root re-exports (arrow_array, arrow_flight, arrow_ipc, catalog as parseable_catalog, datafusion, datafusion_proto) expand the public API surface; either mark them explicitly as unstable/internal or move them under a dedicated namespace/module (e.g., reexports::arrow::*) and add a clear doc-comment on each symbol indicating stability guarantees (stable API vs internal/experimental) so consumers won't be accidentally semver-locked; update the lib.rs entries for the listed pub use items to point to the new module or add #[doc = "... stability: ..."] comments and/or cfg(feature = "internal-reexports") gating as appropriate.

src/query/stream_schema_provider.rs (1)
701-701: Document the new public helper.

`try_from_expr` is now part of the public surface. Please add rustdoc describing the accepted expression shapes and the `time_partition == None` behavior so custom schema providers can depend on it safely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/stream_schema_provider.rs` at line 701, Add Rustdoc for the newly public helper try_from_expr in stream_schema_provider.rs: describe the accepted expression shapes (e.g., exact/match on column names, literal values, supported Expr variants) and any constraints the function assumes from Expr, and explicitly document the behavior when time_partition is None (what is returned/assumed and how partition-sensitive logic behaves). Reference the function name try_from_expr and the Expr type in the doc so external/custom schema providers know how to call it and what results or None means; keep the doc concise and include examples of expected expression shapes in prose.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/query/mod.rs`:
- Around line 211-239: The schema provider registration is using
PARSEABLE.storage().get_object_store() instead of the storage passed into
create_session_context(storage), so schemas get registered against the wrong
backend; update both branches where SCHEMA_PROVIDER.new_provider and
GlobalSchemaProvider are constructed to use the caller's storage (the function
parameter named storage) by passing Some(storage.get_object_store()) or
storage.get_object_store() and tenant_id as before, and then call
catalog.register_schema as-is so the catalog is registered against the provided
storage rather than PARSEABLE.storage().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 2b7d7002-715b-4e6c-a3c9-30b2bb3d8300
📒 Files selected for processing (4)
- Cargo.toml
- src/lib.rs
- src/query/mod.rs
- src/query/stream_schema_provider.rs
⚠️ Outside diff range comments (1)
src/query/mod.rs (1)
319-338: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Undefined variable `ctx` will cause a compilation error.

The variable `ctx` is referenced on lines 319, 322, and 338 but is never defined. The `execute` method calls `QUERY_SESSION.get_ctx()` inline on line 302 but does not assign it to a local variable. This code will not compile.

🐛 Proposed fix

Add a local `ctx` binding before usage:

```diff
     if fields.is_empty() && !is_streaming {
         return Ok((Either::Left(vec![]), fields));
     }

+    let ctx = QUERY_SESSION.get_ctx();
     let plan = ctx.state().create_physical_plan(df.logical_plan()).await?;

     let results = if !is_streaming {
         let task_ctx = ctx.task_ctx();
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/mod.rs` around lines 319 - 338, The error is that `ctx` is used but not defined: before calling ctx.state(), ctx.task_ctx(), etc., assign the session context returned from QUERY_SESSION.get_ctx() to a local variable (e.g., let ctx = QUERY_SESSION.get_ctx()?) in the execute function so subsequent calls like ctx.state().create_physical_plan(...), collect_partitioned(plan.clone(), ctx.task_ctx().clone()), get_total_bytes_scanned(&plan), and increment_bytes_scanned_in_query_by_date(..., tenant) compile; locate the call site where QUERY_SESSION.get_ctx() is currently invoked inline (in execute) and replace it with a local binding named `ctx` used by create_physical_plan, collect_partitioned, get_total_bytes_scanned, and increment_bytes_scanned_in_query_by_date.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6cbe7d76-b2bf-4e69-a043-307af0a8fcc7
📒 Files selected for processing (3)
- src/lib.rs
- src/query/mod.rs
- src/query/stream_schema_provider.rs
✅ Files skipped from review due to trivial changes (2)
- src/query/stream_schema_provider.rs
- src/lib.rs
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/query/mod.rs (1)
209-229: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the caller's `storage` when constructing schema providers.

`create_session_context(storage)` still registers schemas against `PARSEABLE.storage().get_object_store()` in both branches. That gives callers a session state for one backend and schema providers for another whenever they pass a non-default `ObjectStorageProvider`.

Suggested fix

```diff
 pub fn create_session_context(storage: Arc<dyn ObjectStorageProvider>) -> SessionContext {
     let state = Self::create_session_state(storage.clone());
+    let object_store = storage.get_object_store();

     let catalog = state
         .catalog_list()
         .catalog(&state.config_options().catalog.default_catalog)
         .expect("default catalog is provided by datafusion");
@@
     for t in tenants.iter() {
         let schema_provider = if let Some(provider) = SCHEMA_PROVIDER.get() {
             provider.new_provider(
-                Some(PARSEABLE.storage().get_object_store()),
+                Some(object_store.clone()),
                 &Some(t.to_owned()),
             )
         } else {
             Box::new(GlobalSchemaProvider {
-                storage: PARSEABLE.storage().get_object_store(),
+                storage: object_store.clone(),
                 tenant_id: Some(t.to_owned()),
             })
         };
         let _ = catalog.register_schema(t, schema_provider.into());
     }
@@
     let schema_provider = if let Some(provider) = SCHEMA_PROVIDER.get() {
-        provider.new_provider(Some(PARSEABLE.storage().get_object_store()), &None)
+        provider.new_provider(Some(object_store.clone()), &None)
     } else {
         Box::new(GlobalSchemaProvider {
-            storage: PARSEABLE.storage().get_object_store(),
+            storage: object_store,
             tenant_id: None,
         })
     };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/mod.rs` around lines 209 - 229, The session creation currently always uses PARSEABLE.storage().get_object_store() when constructing schema providers, causing providers to be bound to the default backend instead of the caller-provided one; update create_session_context(storage) so both branches use the passed-in storage (the function parameter named storage) when calling SCHEMA_PROVIDER.get().new_provider(...) and when constructing GlobalSchemaProvider (replace uses of PARSEABLE.storage().get_object_store() with storage.get_object_store() or the appropriate accessor), ensuring catalog.register_schema still receives the schema_provider created from the caller's storage.
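The bug pattern behind this finding — accepting a storage parameter but reaching for the process-wide default anyway — can be shown in a tiny std-only sketch. The function names and string "backends" here are illustrative stand-ins, not Parseable's real types:

```rust
// Stand-in for PARSEABLE.storage(): the process-wide default backend.
fn default_store() -> String {
    "default-backend".to_string()
}

// Buggy shape: the parameter is accepted but the global is consulted.
fn register_schemas_buggy(_storage: &str) -> String {
    format!("registered against {}", default_store())
}

// Fixed shape: bind the caller's store once and use it in every branch.
fn register_schemas_fixed(storage: &str) -> String {
    let object_store = storage.to_string();
    format!("registered against {object_store}")
}

fn main() {
    // A caller passing a non-default backend gets mismatched registration
    // from the buggy version, and the expected backend after the fix.
    assert_eq!(register_schemas_buggy("s3-tenant-a"), "registered against default-backend");
    assert_eq!(register_schemas_fixed("s3-tenant-a"), "registered against s3-tenant-a");
    println!("fixed version honors the caller's storage");
}
```

Binding `object_store` once at the top, as in the suggested diff, also avoids repeating the accessor in each branch.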
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 435a0108-16bb-4acd-b14f-90dcc5110196
⛔ Files ignored due to path filters (1)
- Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
- Cargo.toml
- src/metrics/mod.rs
- src/query/mod.rs
- src/query/stream_schema_provider.rs
- src/utils/arrow/flight.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/query/stream_schema_provider.rs
```rust
pub static TOTAL_FILES_SCANNED_IN_HOTTIER_BY_DATE: Lazy<IntCounterVec> = Lazy::new(|| {
    IntCounterVec::new(
        Opts::new(
            "total_files_scanned_in_hottier_by_date",
            "Total files scanned in hottier by date",
        )
        .namespace(METRICS_NAMESPACE),
        &["stream", "date", "tenant_id"],
    )
    .expect("metric can be created")
});
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
FILE="src/metrics/mod.rs"
echo "Occurrences of TOTAL_FILES_SCANNED_IN_HOTTIER_BY_DATE:"
rg -nC2 'TOTAL_FILES_SCANNED_IN_HOTTIER_BY_DATE' "$FILE"
echo
echo "Registration block in custom_metrics:"
sed -n '512,580p' "$FILE"
```

Repository: parseablehq/parseable
Length of output: 3289
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
FILE="src/metrics/mod.rs"
# Find the custom_metrics function definition and check its bounds
echo "=== custom_metrics function definition and scope ==="
sed -n '463,470p' "$FILE"
# Get the complete custom_metrics function (find where it ends)
echo
echo "=== End of custom_metrics function ==="
sed -n '575,585p' "$FILE"
# Search for any other registration attempts with the new metric
echo
echo "=== All occurrences of register and TOTAL_FILES_SCANNED_IN_HOTTIER_BY_DATE in same context ==="
rg -n 'register.*TOTAL_FILES_SCANNED_IN_HOTTIER_BY_DATE|TOTAL_FILES_SCANNED_IN_HOTTIER_BY_DATE.*register' "$FILE"
# Confirm the metric is not already registered elsewhere
echo
echo "=== Checking if the metric appears in any register calls ==="
rg -n '\.register\(' "$FILE" | grep -i 'hottier' || echo "No register calls found for hottier metric"
```

Repository: parseablehq/parseable
Length of output: 920
Register this counter in custom_metrics.
TOTAL_FILES_SCANNED_IN_HOTTIER_BY_DATE is declared at line 260 and incremented in the helper at lines 704-706, but it is never added to METRICS_REGISTRY inside custom_metrics. With the custom registry setup in this file, the metric will not be exposed on /metrics.
Proposed fix

```diff
 registry
     .register(Box::new(TOTAL_QUERY_CALLS_BY_DATE.clone()))
     .expect("metric can be registered");
+registry
+    .register(Box::new(TOTAL_FILES_SCANNED_IN_HOTTIER_BY_DATE.clone()))
+    .expect("metric can be registered");
 registry
     .register(Box::new(TOTAL_FILES_SCANNED_IN_QUERY_BY_DATE.clone()))
     .expect("metric can be registered");
```

🤖 Prompt for AI Agents
.expect("metric can be registered");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/metrics/mod.rs` around lines 260 - 270, The new IntCounterVec
TOTAL_FILES_SCANNED_IN_HOTTIER_BY_DATE is never registered with the custom
prometheus registry, so it won't be exposed; update the custom_metrics function
to call
METRICS_REGISTRY.register(Box::new(TOTAL_FILES_SCANNED_IN_HOTTIER_BY_DATE.clone()))
(or the equivalent register method used in this module) alongside the other
metrics registration, handling any register errors consistently with existing
patterns so the counter is available on /metrics.
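The failure mode is easy to model without the prometheus crate: a registry only exposes what was explicitly registered, and declaring a `Lazy` counter does not register it. A minimal std-only sketch (the `Registry` type here is an illustrative stand-in, not the prometheus API):

```rust
use std::collections::HashSet;

// Minimal stand-in for a prometheus registry: a metric only appears on
// /metrics if it was explicitly registered. Names are illustrative.
struct Registry {
    metrics: HashSet<&'static str>,
}

impl Registry {
    fn new() -> Self {
        Self { metrics: HashSet::new() }
    }
    fn register(&mut self, name: &'static str) {
        self.metrics.insert(name);
    }
    fn is_exposed(&self, name: &'static str) -> bool {
        self.metrics.contains(name)
    }
}

fn main() {
    let mut registry = Registry::new();
    registry.register("total_query_calls_by_date");
    // Declared and even incremented, but never registered: invisible on /metrics.
    assert!(!registry.is_exposed("total_files_scanned_in_hottier_by_date"));
    // The one-line fix from the review comment:
    registry.register("total_files_scanned_in_hottier_by_date");
    assert!(registry.is_exposed("total_files_scanned_in_hottier_by_date"));
    println!("counter exposed after registration");
}
```

With the real crate the same shape applies: `registry.register(Box::new(COUNTER.clone()))` must run inside `custom_metrics` alongside the other counters.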
```rust
let flight_data_stream = FlightDataEncoderBuilder::new()
    .with_max_flight_data_size(usize::MAX)
    .with_options(write_options)
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Verify all Arrow Flight encoder call sites still use unbounded size.
rg -nP --type=rust 'with_max_flight_data_size\(\s*usize::MAX\s*\)'
```

Repository: parseablehq/parseable
Length of output: 219
🏁 Script executed:
```shell
#!/bin/bash
# Read the full context of both functions to understand their purpose and design
sed -n '146,190p' src/utils/arrow/flight.rs | cat -n
```

Repository: parseablehq/parseable
Length of output: 2116
🏁 Script executed:
```shell
#!/bin/bash
# Search for any configuration or constants related to Flight frame sizing
rg -i 'max.*flight.*size|flight.*chunk|frame.*size' --type=rust src/
```

Repository: parseablehq/parseable
Length of output: 365
🏁 Script executed:
```shell
#!/bin/bash
# Look for any comments or documentation about the encoding choice
sed -n '140,195p' src/utils/arrow/flight.rs
```

Repository: parseablehq/parseable
Length of output: 1956
Bound Flight frame size instead of usize::MAX.
Both into_flight_data_stream and into_flight_data use unbounded with_max_flight_data_size(usize::MAX), which allows the Arrow IPC encoder to accumulate data excessively before chunking. Although gRPC's max_frame_size is set to 16 MB in src/handlers/airplane.rs, this creates an implicit and inefficient constraint. Define an explicit constant and use it consistently across both functions instead.
Proposed fix

```diff
+const MAX_FLIGHT_DATA_SIZE: usize = 16 * 1024 * 1024;
+
 pub fn into_flight_data_stream(
     stream: datafusion::execution::SendableRecordBatchStream,
 ) -> Result<Response<DoGetStream>, Box<Status>> {
     let record_stream = stream.map_err(|e| {
         arrow_flight::error::FlightError::Arrow(arrow_schema::ArrowError::ExternalError(
             Box::new(e),
         ))
     });
     let write_options = IpcWriteOptions::default()
         .try_with_compression(Some(arrow_ipc::CompressionType(1)))
         .map_err(|err| Status::failed_precondition(err.to_string()))?;
     let flight_data_stream = FlightDataEncoderBuilder::new()
-        .with_max_flight_data_size(usize::MAX)
+        .with_max_flight_data_size(MAX_FLIGHT_DATA_SIZE)
         .with_options(write_options)
         .build(record_stream);
     let flight_data_stream = flight_data_stream.map_err(|err| Status::unknown(err.to_string()));
     Ok(Response::new(Box::pin(flight_data_stream) as DoGetStream))
 }

 pub fn into_flight_data(records: Vec<RecordBatch>) -> Result<Response<DoGetStream>, Box<Status>> {
     let input_stream = futures::stream::iter(records.into_iter().map(Ok));
     let write_options = IpcWriteOptions::default()
         .try_with_compression(Some(arrow_ipc::CompressionType(1)))
         .map_err(|err| Status::failed_precondition(err.to_string()))?;
     let flight_data_stream = FlightDataEncoderBuilder::new()
-        .with_max_flight_data_size(usize::MAX)
+        .with_max_flight_data_size(MAX_FLIGHT_DATA_SIZE)
         .with_options(write_options)
         // .with_schema(schema.into())
         .build(input_stream);
     let flight_data_stream = flight_data_stream.map_err(|err| Status::unknown(err.to_string()));
     Ok(Response::new(Box::pin(flight_data_stream) as DoGetStream))
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/arrow/flight.rs` around lines 161 - 163, Replace the unbounded
Flight frame size usage by defining a single explicit constant (e.g.
MAX_FLIGHT_FRAME_SIZE = 16 * 1024 * 1024) and use it wherever
FlightDataEncoderBuilder::with_max_flight_data_size currently receives
usize::MAX; update both into_flight_data_stream and into_flight_data to call
.with_max_flight_data_size(MAX_FLIGHT_FRAME_SIZE) so the Arrow IPC encoder
chunks data consistently with gRPC's frame size instead of allowing unlimited
accumulation.
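The effect of bounding the encoder can be sketched with simple arithmetic: a bounded encoder splits an N-byte payload into roughly ceil(N / bound) frames, while `usize::MAX` lets the whole payload accumulate into one oversized frame. A std-only sketch (the constant name follows the reviewer's suggestion and is not an existing symbol in the codebase):

```rust
// Bound suggested by the review, matching gRPC's 16 MB max_frame_size
// configured in src/handlers/airplane.rs; the name is hypothetical.
const MAX_FLIGHT_FRAME_SIZE: usize = 16 * 1024 * 1024;

// A bounded encoder emits roughly ceil(payload / bound) frames; with
// usize::MAX the payload would go out as a single oversized frame.
fn expected_frames(payload_bytes: usize) -> usize {
    payload_bytes.div_ceil(MAX_FLIGHT_FRAME_SIZE)
}

fn main() {
    assert_eq!(expected_frames(MAX_FLIGHT_FRAME_SIZE), 1);
    assert_eq!(expected_frames(MAX_FLIGHT_FRAME_SIZE + 1), 2);
    // A 100 MiB result now streams as several gRPC-sized frames.
    assert_eq!(expected_frames(100 * 1024 * 1024), 7);
    println!("bounded encoder frame counts check out");
}
```

Keeping the constant next to the two encoder call sites also makes the implicit coupling with the gRPC frame size explicit in one place.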
Fixes #XXXX.
Description
This PR has: